and much of science consists in determining them; thus, in a sense, “constraint” is
synonymous with “regularity.” Laws of nature are clearly constraints, and the very
existence of physical objects such as tables and aeroplanes, which have fewer degrees
of freedom than their constituent parts considered separately, is a manifestation of
constraint.
In this book we are particularly concerned with constraints applied to sequences.
Clearly, if a Markov process is in operation, the variety of the set of possible sequences
generated from a particular alphabet is smaller than it would be had successive
symbols been freely selected; that is, it is indeed “smaller than it might have been”.
“Might have been” requires the qualification, then, of “would have been if successive
symbols had been freely (or randomly—leaving the discussion of ‘randomness’ to
Chap. 11) selected”. We already know how to calculate the entropy (or information,
or Shannon index, or Shannon–Weaver index) $I$ of a random sequence (Eq. 6.5);
there is a precise way of calculating the entropy per symbol for a Markov process
(see Sect. 11.2), and the reader may use the formula derived there to verify that the
entropy of a Markov process is less than that of a “perfectly random” process. Using
some of the terminology already introduced, we may expand on this statement to say
that the surprise occasioned by receiving a piece of information is lower if constraint
is operating; for example, when spelling out a word, it is practically superfluous to
say “u” after “q.”
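To make the comparison concrete, here is a minimal sketch in Python (not from the text; the three-symbol transition matrix and the use of NumPy are assumptions made purely for illustration). It computes the entropy per symbol of a Markov process by weighting each state's row entropy by the stationary probability of that state—the standard entropy-rate formula, which should agree with the one derived in Sect. 11.2—and compares it with the entropy of free selection from the same alphabet.

```python
import numpy as np

# Transition matrix of a 3-symbol Markov process (rows sum to 1).
# The probabilities are arbitrary, chosen only to exhibit constraint.
P = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
])

# Stationary distribution: left eigenvector of P with eigenvalue 1, normalised.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.isclose(eigvals, 1))])
pi /= pi.sum()

# Entropy rate of the Markov process (bits per symbol):
#   H = -sum_i pi_i sum_j P_ij log2 P_ij
# (zero transition probabilities, if any, contribute nothing)
logP = np.log2(P, where=P > 0, out=np.zeros_like(P))
H_markov = -np.sum(pi[:, None] * P * logP)

# Entropy per symbol when successive symbols are freely (equiprobably) selected.
H_free = np.log2(P.shape[0])

print(f"Markov entropy rate: {H_markov:.3f} bits/symbol")
print(f"Free selection:      {H_free:.3f} bits/symbol")
```

For this particular matrix the Markov entropy rate comes out at about 1.0 bit per symbol, against roughly 1.58 bits for free selection from the three-symbol alphabet, illustrating that constraint lowers the entropy.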
The constraints affecting the choice of successive words are a manifestation of
the syntax of a language. 14 In the next chapter other ways in which constraint can
operate will be examined, but for now we can simply state that whenever constraint is
present, the entropy (of the set we are considering, hence of the information received
by selecting a member of that set) is lower than it would be for a perfectly random
selection from that set.
This maximum entropy (which, in physical systems, corresponds to the most
probable arrangement; i.e., to the macroscopic state that can be arranged in the
largest number of ways)—let us call it $I_{\max}$—allows us to define a relative entropy $I_{\text{rel}}$,
$$I_{\text{rel}} = \frac{\text{actual entropy}}{I_{\max}}, \tag{6.17}$$
and a redundancy $R$,
$$R = 1 - I_{\text{rel}}. \tag{6.18}$$
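As a purely illustrative application of Eqs. 6.17 and 6.18 (the figures are chosen for the sake of the arithmetic and are not taken from the text): if a source using a 26-letter alphabet had an actual entropy of 1 bit per symbol, while $I_{\max} = \log_2 26 \approx 4.70$ bits, then
$$I_{\text{rel}} = \frac{1}{4.70} \approx 0.21, \qquad R = 1 - 0.21 \approx 0.79;$$
that is, the sequence would carry only about a fifth of the information it could carry in the absence of constraint.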
In a fascinating piece of work, Shannon (1951) established the entropy of English
essentially through empirical investigations using rooms full of people trying to guess
incomplete texts. 15
14 Animal communication is typically non-syntactic; the vast expressive power of human language
would be impossible without syntax, which could be thought of as the combination of discrete
components in, potentially, infinite ways. Nowak et al. (2000) have suggested that syntax could
only evolve if the number of discrete components exceeds a threshold.
15 Note that most computer languages lack redundancy: a single wrong character in a program
will usually cause it to fail to compile, or to halt.